Take Home Ex-1

Overview

Step 1: Load the required Packages

sf : Simple features for R. We import this package to help us read the aspatial and geospatial data.

tidyverse: This package help us to transform and better present the data.

tmap: We use this package to plot thematic maps

spdep: We use this package to help us obtain the spatial weights.

funModeling: We use this package to help us with EDA

here: Helps us generate a path to a specific directory on the root

Load the packages:

pacman::p_load(sf, tidyverse, tmap, spdep, funModeling, purrr, here)

Step 2: Import the Data

Import Nigeria water point data-file:

Generate a path:
We use the here function to generate a specific file path on the root folder.

shapefile_path <- here("data", "dataNigeria", "geospatial")
shapefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial"

Some useful link for the CRS:

st_read() belongs to the sf package. It reads simple features from file or database. Simple features or simple feature access refers to formal standard ISO 19125-1:2004 that describes how real world data can be represented in computers, with emphasis on the spatial geometry of these objects. Link below:

https://cran.r-project.org/web/packages/sf/vignettes/sf1.html#:~:text=Simple%20features%20or%20simple%20feature,spatial%20geometry%20of%20these%20objects.

To find the CRS of the shapefile, open the .prj file as a text. It will tell you which projection system is being used.

Read the shapefile using st_read() belonging to the sf package:
The data read will be saved as a simple feature data table.
We will use the filter() function of dplyr package to filter only rows for Nigeria

# wp <- st_read(
#   dsn = shapefile_path,
#   layer = "geo_export",
#   crs = 4326) %>%
#   filter(clean_coun == "Nigeria")

Generate the save path using here function:

savefile_path <- here("data", "dataNigeria", "geospatial", "wp_nga.rds")
savefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/wp_nga.rds"

We will next save the file using write_rds() of the tidyverse package:
rds is a native data format of R.

# wp_ng <- write_rds(wp, savefile_path)

Import Nigeria geo-boundary file:

Next we will make the path to the geo boundary file:

shapefile_path <- here("data", "dataNigeria", "boundary")
shapefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/boundary"

Next we will Import the Nigeria LGA Boundary Data with st_read() function:
The imported data will be saved as a simple features dataset.

nga <- st_read(
  dsn = shapefile_path,
  layer = "geoBoundaries-NGA-ADM2",
  crs = 4326)
Reading layer `geoBoundaries-NGA-ADM2' from data source 
  `D:\f4sared\ISSS624\data\dataNigeria\boundary' using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

Downsize further the wp_nga data:

Load the previously saved data:
We will select specific columns using select().

# final <- read_rds(rdsfile_path) %>% select(1:2, 14:17, 23)

Create the path for saving the file

# savefile_path <- here("data", "dataNigeria", "geospatial", "wp_nga_v2.rds")
# savefile_path

Save the file:

# write_rds(final, savefile_path)

Step 3: Data wrangling

Visualize Initial distribution

Generate path to rds file saved previously:

rdsfile_path <- here("data", "dataNigeria", "geospatial","wp_nga_v2.rds")
rdsfile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/wp_nga_v2.rds"

Load the rds file with read_rds() function of the tidyverse package:
We will also make use of the piping to replace the “na” values with “unknown”.
mutate() is a function of the dplyr package.

wp_nga <- read_rds(rdsfile_path) %>%
  mutate(status_cle = replace_na(status_cle, "Unknown"))

Check the CRS of the spatial datafile with st_crs():

st_crs(wp_nga)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Use the freq() of the funModeling package to show the distribution percentage of status_cle:

freq(data=wp_nga,
     input = 'status_cle')
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
  Please report the issue at <https://github.com/pablo14/funModeling/issues>.

                        status_cle frequency percentage cumulative_perc
1                       Functional     45883      48.29           48.29
2                   Non-Functional     29385      30.93           79.22
3                          Unknown     10656      11.22           90.44
4      Functional but needs repair      4579       4.82           95.26
5 Non-Functional due to dry season      2403       2.53           97.79
6        Functional but not in use      1686       1.77           99.56
7         Abandoned/Decommissioned       234       0.25           99.81
8                        Abandoned       175       0.18           99.99
9 Non functional due to dry season         7       0.01          100.00

Filter for functional water-points:

Here we will use the filter() function from the dplyr package to select “functional” rows only:
We use the %in% to denote the membership in the group of strings.

wpt_functional <- wp_nga %>%
  filter(status_cle %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))

Then we will plot with freq() function from funModeling to show the distribution:

freq(data=wpt_functional, 
     input = 'status_cle')

                   status_cle frequency percentage cumulative_perc
1                  Functional     45883      87.99           87.99
2 Functional but needs repair      4579       8.78           96.77
3   Functional but not in use      1686       3.23          100.00

Filter for non-functional

Filter for non-functional rows:
Use %in% for to select rows that fall into the specific categories.

wpt_nonfunctional <- wp_nga %>%
  filter(status_cle %in%
           c("Abandoned/Decommissioned", 
             "Abandoned",
             "Non-Functional",
             "Non functional due to dry season",
             "Non-Functional due to dry season"))

Plot the distribution with the freq() function:

freq(data=wpt_nonfunctional, 
     input = 'status_cle')

                        status_cle frequency percentage cumulative_perc
1                   Non-Functional     29385      91.25           91.25
2 Non-Functional due to dry season      2403       7.46           98.71
3         Abandoned/Decommissioned       234       0.73           99.44
4                        Abandoned       175       0.54           99.98
5 Non functional due to dry season         7       0.02          100.00

Filter for unknown:

Lastly we filter for the rows that have unknown status:

wpt_unknown <- wp_nga %>%
  filter(status_cle == "Unknown")

Perform data manipulation (Point-In-Polygon):

Using st_intersects, we will be able to create a list of rows from wp_nga that intersects each row of nga.
For the intersection to work, st_intersect will check if each point falls within the polygon of nga.
Next we use the lengths() function to count the number of instances. Then we append to a new column.

We repeat this step across all 3 categories of Functional, Non-Functional & Unknown:

nga_wp <- nga %>%
  mutate(`total wpt` = lengths(
    st_intersects(nga, wp_nga))) %>%
  mutate(`wpt functional` = lengths(
    st_intersects(nga, wpt_functional))) %>%
  mutate(`wpt non-functional` = lengths(
    st_intersects(nga, wpt_nonfunctional))) %>%
  mutate(`wpt unknown` = lengths(
    st_intersects(nga, wpt_unknown)))

Next, using the mutate() function of dplyr, we will create 2 new columns:

  • pct_functional = `wpt functional`/`total wpt`

  • pct_non-functional = `wpt non-functional`/`total wpt`

We will then use select() of dplyr to retain the fields that we require.

nga_wp <- nga_wp %>%
  mutate(pct_functional = `wpt functional`/`total wpt`) %>%
  mutate(`pct_non-functional` = `wpt non-functional`/`total wpt`) 

We will then create a save file path:

savefile_path <- here("data", "dataNigeria", "geospatial", "nga_wp.rds")
savefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/nga_wp.rds"

Next we will save this final dataframe using write_rds() of tidyverse package:

write_rds(nga_wp, savefile_path)

Step 4: Plot distribution of water point types

Plot the distribution of the water points using qtm() package of tmap:
Here we will add the additional settings to better adjust the size of the legend.
This is done via the tm_layout() function.

nga_wp <- read_rds(savefile_path)

total <- qtm(nga_wp, fill = "total wpt") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

wp_functional <- qtm(nga_wp, fill = "wpt functional") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

wp_nonfunctional <- qtm(nga_wp, fill = "wpt non-functional") +
    tm_layout(legend.height = 0.4,legend.width = 0.4)

unknown <- qtm(nga_wp, fill = "wpt unknown") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=2, ncol=2)

Step 5: Transform to EPSG 26391

We will use the st_crs() function to check the current EPSG of the sf datafile:

st_crs(nga_wp)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Next, we will use st_transform() to project to the new crs 26391:

nga_wp_26391 <- st_transform(nga_wp, crs = 26391)

We then check the new CRS:

st_crs(nga_wp_26391)
Coordinate Reference System:
  User input: EPSG:26391 
  wkt:
PROJCRS["Minna / Nigeria West Belt",
    BASEGEOGCRS["Minna",
        DATUM["Minna",
            ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4263]],
    CONVERSION["Nigeria West Belt",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",4,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",4.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.99975,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",230738.26,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Nigeria - onshore west of 6°30'E, onshore and offshore shelf."],
        BBOX[3.57,2.69,13.9,6.5]],
    ID["EPSG",26391]]

Once again, we will plot the map to check if everything has been projected correctly:

total <- qtm(nga_wp_26391, fill = "total wpt") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

wp_functional <- qtm(nga_wp_26391, fill = "wpt functional") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

wp_nonfunctional <- qtm(nga_wp_26391, fill = "wpt non-functional") +
    tm_layout(legend.height = 0.4,legend.width = 0.4)

unknown <- qtm(nga_wp_26391, fill = "wpt unknown") + 
    tm_layout(legend.height = 0.4,legend.width = 0.4)

tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=2, ncol=2)

Step 6: Thematic mapping with qtm() & tmap()

Plotting with qtm()

Set the tmap mode to “view” for interactive mode. Else set to “plot”.

qtm() represents quick plotting with tmap package:

tmap_mode("plot")
tmap mode set to plotting
qtm(nga_wp_26391, 
    fill = "wpt non-functional")

Plotting with tmap()

Jenks:

Jenks: Identify groups with similar values and maximizes the difference between them.

Using interactive tmap(), we will plot using “equal” classification:

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(nga_wp_26391)+
  tm_fill(c("wpt non-functional", "wpt functional"),
          style = "jenks") +
  tm_borders(alpha = 0.5) +
  tmap_style("white")
tmap style set to "white"
other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor" 

Quantile:

Splits variables in quantiles, Consequently there are same number of observations in each interval.

Using interactive tmap(), we will plot using “quantile” classification:

tm_shape(nga_wp_26391)+
  tm_fill(c("wpt non-functional", "wpt functional"),
          style = "quantile") +
  tm_borders(alpha = 0.5) +
  tmap_style("white")
tmap style set to "white"
other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor" 

We will use bbox to get better visuals:
https://stackoverflow.com/questions/60892033/how-do-you-position-the-title-and-legend-in-tmap

tmap_mode("plot")
tmap mode set to plotting
# make some bbox magic
bbox_new <- st_bbox(nga_wp_26391)
xrange <- bbox_new$xmax - bbox_new$xmin # range of x values
yrange <- bbox_new$ymax - bbox_new$ymin # range of y values
# bbox_new[1] <- bbox_new[1] - (0.25 * xrange) # xmin - left
bbox_new[3] <- bbox_new[3] + (0.75 * xrange) # xmax - right
# bbox_new[2] <- bbox_new[2] - (0.25 * yrange) # ymin - bottom
bbox_new[4] <- bbox_new[4] + (0.01 * yrange) # ymax - top
bbox_new <- bbox_new %>%  # take the bounding box ...
  st_as_sfc() # ... and make it a sf polygon


tm_shape(nga_wp_26391, bbox = bbox_new)+
  tm_fill("wpt non-functional", 
          style = "quantile", 
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.5) +
  tm_borders(alpha = 0.5) + 
  tm_layout(main.title = "Quantile Plot of WPT Non-Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "center"),
            frame = TRUE) + 
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.2, position= c("center", "bottom")) +
  tm_grid(lwd = 0.1, alpha = 0.2) 

tmap_mode("plot")
tmap mode set to plotting
# make some bbox magic
bbox_new <- st_bbox(nga_wp_26391)
xrange <- bbox_new$xmax - bbox_new$xmin # range of x values
yrange <- bbox_new$ymax - bbox_new$ymin # range of y values
# bbox_new[1] <- bbox_new[1] - (0.25 * xrange) # xmin - left
bbox_new[3] <- bbox_new[3] + (0.75 * xrange) # xmax - right
# bbox_new[2] <- bbox_new[2] - (0.25 * yrange) # ymin - bottom
bbox_new[4] <- bbox_new[4] + (0.01 * yrange) # ymax - top
bbox_new <- bbox_new %>%  # take the bounding box ...
  st_as_sfc() # ... and make it a sf polygon


tm_shape(nga_wp_26391, bbox = bbox_new)+
  tm_fill("wpt functional", 
          style = "quantile", 
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.5) +
  tm_borders(alpha = 0.5) + 
  tm_layout(main.title = "Quantile Plot of WPT Non-Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "center"),
            frame = TRUE) + 
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.2, position= c("center", "bottom")) +
  tm_grid(lwd = 0.1, alpha = 0.2) 


Step 7: Prepare the Spatial Weights

Usually for geospatial weights preparation, there are 2 options:

  • Contiguity based weights

    • Useful when the polygons are of similar size
  • Distance based weights

    • Fixed distance

    • K-nearest neighbors (Adaptive distance)

    • Inversed-Distance

For our analysis, we will consider using the distance based weights because the polygons size is not spread evenly across the map of Nigeria.

Prepare the coordinates

In order to work with fixed distance weight matrix, we need to prepare the coordinates of the center of gravity (COG) for all the polygons.

To achieve this, we will need to pass the column “geometry” into the function st_centroid() from the sf package. This function will calculate the COG and return us the coordinates accordingly. We will also make use of the map_dbl() function from the purrr package to apply the st_centroid() function to row.

Longitude:

longitude <- map_dbl(nga_wp_26391$geometry, ~st_centroid(.x)[[1]])

Latitude:

latitude <- map_dbl(nga_wp_26391$geometry, ~st_centroid(.x)[[2]])

Next, we will use the function of cbind() from the base package to bind our coordinates together.

Bind long & lat:

coords <- cbind(longitude, latitude)

Check if output is correct:

head(coords)
     longitude  latitude
[1,]  549364.0  123694.9
[2,]  547123.4  120376.5
[3,] 1189496.9 1059770.9
[4,]  489057.4  534262.6
[5,]  593718.2  113824.1
[6,]  642618.7  251222.3

Determine cut off distance:

In order to ensure that each polygon has at least 1 neighbor, we need to determine the cut off distance. To do this, we will first run the k-nearest neighbor with k=1. This is achieved with the function of knearneigh() of the spdep package.

We run the code as follow:

k1 <- knn2nb(knearneigh(coords))

After obtaining k1, a list of 774 polygons where each row shows the nearest neighbor polygon ID, we next need to calculate the distance between all of them using nbdists() of spdep.

Next, in order to calculate the summary, we need then to unlist() the output of nbdists(). This is then followed by using the summary() function which reports to us the max distance for us to use.

We run the code as follow:

k1dists <- unlist(nbdists(k1, coords, longlat = FALSE))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2669   12834   20304   22084   27783   72139 

From the above summary, it shows that the minimum distance for every polygon to have a neighbor is 71.661, thus we will use the value 72140.

Compute the fixed distance neighbor list:

With the above information, we will then compute the fixed distance weight matrix using dnearneigh() function from spdep. For this to work, we will need the coordinate of the polygons, min dist & max dist.

We will run the code as follow, this will give us a nb (neighbor) object data wm_d72:

nb_d72 <- dnearneigh(coords, 0, 72140, longlat = FALSE)
nb_d72
Neighbour list object:
Number of regions: 774 
Number of nonzero links: 18130 
Percentage nonzero weights: 3.026327 
Average number of links: 23.42377 

Next, we will display our results for viewing:

str(nb_d72)
List of 774
 $ : int [1:63] 2 5 10 25 55 66 68 103 122 181 ...
 $ : int [1:62] 1 5 10 25 55 66 68 103 122 181 ...
 $ : int [1:2] 261 447
 $ : int [1:10] 12 20 257 263 446 454 466 641 690 695
 $ : int [1:56] 1 2 55 66 104 136 137 169 184 202 ...
 $ : int [1:21] 9 14 18 19 56 170 217 218 330 337 ...
 $ : int [1:19] 8 15 22 176 177 214 281 282 283 295 ...
 $ : int [1:32] 7 15 22 49 176 177 214 275 276 277 ...
 $ : int [1:26] 6 18 19 56 66 77 103 104 217 218 ...
 $ : int [1:64] 1 2 23 25 66 103 181 190 191 203 ...
 $ : int [1:22] 26 27 43 68 126 157 190 191 204 336 ...
 $ : int [1:11] 4 135 257 263 401 417 429 446 454 690 ...
 $ : int [1:13] 31 37 38 40 94 211 320 393 436 471 ...
 $ : int [1:24] 6 170 193 194 195 217 309 310 311 362 ...
 $ : int [1:27] 7 8 22 32 49 51 62 82 176 177 ...
 $ : int [1:37] 30 38 39 41 44 45 70 71 120 124 ...
 $ : int [1:34] 28 29 35 72 172 173 178 179 182 275 ...
 $ : int [1:29] 6 9 19 56 66 77 103 104 217 218 ...
 $ : int [1:41] 6 9 18 25 56 66 77 103 104 181 ...
 $ : int [1:7] 4 106 239 263 419 454 466
 $ : int [1:9] 60 61 162 269 484 520 578 596 626
 $ : int [1:31] 7 8 15 32 49 51 62 82 176 177 ...
 $ : int [1:64] 10 25 52 53 54 56 58 77 78 79 ...
 $ : int [1:5] 123 476 527 673 761
 $ : int [1:68] 1 2 10 19 23 54 56 66 77 103 ...
 $ : int [1:30] 11 27 43 68 157 190 191 204 336 370 ...
 $ : int [1:24] 11 26 43 68 157 191 204 336 370 371 ...
 $ : int [1:43] 17 29 35 70 71 124 172 173 178 179 ...
 $ : int [1:45] 17 28 35 70 71 124 172 173 178 179 ...
 $ : int [1:30] 16 38 39 40 41 44 45 175 185 186 ...
 $ : int [1:13] 13 37 94 158 210 211 212 289 308 561 ...
 $ : int [1:28] 15 22 49 51 62 82 177 196 207 214 ...
 $ : int [1:29] 47 111 130 142 145 155 166 219 227 233 ...
 $ : int [1:11] 42 86 104 136 137 213 375 553 559 733 ...
 $ : int [1:33] 17 28 29 72 172 173 178 179 182 275 ...
 $ : int [1:8] 50 107 247 408 432 455 681 759
 $ : int [1:21] 13 31 38 39 40 41 186 192 197 198 ...
 $ : int [1:25] 13 16 30 37 39 40 41 44 186 192 ...
 $ : int [1:27] 16 30 37 38 40 41 44 185 186 192 ...
 $ : int [1:21] 13 30 37 38 39 41 44 186 192 211 ...
 $ : int [1:22] 16 30 37 38 39 40 44 45 186 192 ...
 $ : int [1:20] 34 86 136 137 184 202 285 286 375 499 ...
 $ : int [1:19] 11 26 27 68 122 126 157 190 191 246 ...
 $ : int [1:27] 16 30 38 39 40 41 45 70 175 186 ...
 $ : int [1:27] 16 30 41 44 70 175 187 188 192 290 ...
 $ : int [1:12] 119 380 387 417 423 429 438 459 521 656 ...
 $ : int [1:24] 33 111 127 130 155 166 227 234 238 242 ...
 $ : int [1:12] 64 65 74 113 131 265 386 407 428 482 ...
 $ : int [1:30] 8 15 22 32 51 62 82 176 177 207 ...
 $ : int [1:4] 36 107 409 432
 $ : int [1:27] 15 22 32 49 62 82 177 207 214 284 ...
 $ : int [1:47] 23 53 54 57 58 77 78 79 80 165 ...
 $ : int [1:37] 23 52 54 57 58 78 79 80 165 189 ...
 $ : int [1:58] 23 25 52 53 56 57 58 77 78 79 ...
 $ : int [1:33] 1 2 5 68 122 157 169 184 190 208 ...
 $ : int [1:51] 6 9 18 19 23 25 54 66 77 78 ...
 $ : int [1:35] 52 53 54 58 78 79 80 165 189 197 ...
 $ : int [1:37] 23 52 53 54 57 78 79 165 189 197 ...
 $ : int [1:5] 128 129 493 700 748
 $ : int [1:14] 21 61 158 269 310 311 561 563 578 589 ...
 $ : int [1:11] 21 60 162 268 269 484 578 589 592 596 ...
 $ : int [1:28] 15 22 32 49 51 82 177 196 207 214 ...
 $ : int [1:5] 384 416 467 765 772
 $ : int [1:7] 48 65 74 113 131 265 407
 $ : int [1:11] 48 64 74 109 113 265 386 407 683 701 ...
 $ : int [1:47] 1 2 5 9 10 18 19 25 56 103 ...
 $ : int [1:26] 72 120 124 179 182 304 305 339 346 347 ...
 $ : int [1:30] 1 2 11 26 27 43 55 122 157 190 ...
 $ : int [1:8] 140 146 248 274 473 500 512 513
 $ : int [1:44] 16 28 29 44 45 71 120 124 172 173 ...
 $ : int [1:50] 16 28 29 70 120 124 172 173 175 178 ...
 $ : int [1:19] 17 35 67 182 361 374 378 404 566 567 ...
 $ : int [1:6] 361 374 377 404 665 666
 $ : int [1:14] 48 64 65 109 113 116 251 265 672 683 ...
 $ : int [1:15] 110 229 255 258 272 373 382 398 422 433 ...
 $ : int [1:9] 254 287 427 459 470 547 647 677 751
 $ : int [1:55] 9 18 19 23 25 52 54 56 78 79 ...
 $ : int [1:51] 23 52 53 54 56 57 58 77 79 80 ...
 $ : int [1:57] 23 52 53 54 56 57 58 77 78 80 ...
 $ : int [1:39] 23 52 53 54 57 77 78 79 165 189 ...
 $ : int [1:19] 99 145 227 233 242 255 270 426 449 483 ...
 $ : int [1:21] 15 22 32 49 51 62 177 207 214 297 ...
 $ : int [1:6] 132 258 383 414 529 767
 $ : int [1:3] 148 437 692
 $ : int [1:38] 101 105 130 142 145 155 156 219 235 242 ...
 $ : int [1:17] 34 42 136 137 184 202 285 286 499 538 ...
 $ : int [1:19] 147 149 151 221 226 245 267 399 410 415 ...
 $ : int [1:5] 150 489 648 700 714
 $ : int [1:11] 100 107 159 260 408 463 542 674 676 681 ...
 $ : int 237
 $ : int [1:3] 160 271 406
 $ : int [1:11] 95 119 390 391 392 423 487 642 656 668 ...
 $ : int [1:3] 354 607 665
 $ : int [1:7] 13 31 158 436 561 596 709
 $ : int [1:10] 92 390 391 392 405 423 469 656 708 770
 $ : int [1:17] 97 108 139 167 168 350 389 403 412 420 ...
 $ : int [1:13] 96 108 114 139 147 168 389 403 420 451 ...
 $ : int [1:4] 153 231 432 696
 $ : int [1:18] 81 145 154 167 227 233 255 270 426 449 ...
  [list output truncated]
 - attr(*, "class")= chr "nb"
 - attr(*, "region.id")= chr [1:774] "1" "2" "3" "4" ...
 - attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 72140, longlat = FALSE)
 - attr(*, "dnn")= num [1:2] 0 72140
 - attr(*, "bounds")= chr [1:2] "GE" "LE"
 - attr(*, "nbtype")= chr "distance"
 - attr(*, "sym")= logi TRUE

Plot the fixed distance neighbor list:

Using the plot() function, we will then visualize the neighbors that we have identified.

plot(nga_wp_26391$geometry, border="black", axes = TRUE)
plot(nb_d72, coords, add=TRUE, col="red")

Step 8: Plotting Spatial Lag

Create row standardized weight matrix

We will use the nb2listw() function of spdep package to convert the nb list to weights. We set the style to “W” in order to perform the row-standardized steps.

We run the code as follow:

swm_d72<- nb2listw(nb_d72, style="W", zero.policy = TRUE)

Compute spatial lag with row-standardized weights

Now we will use the lag.listw() function from spdep to return us the lag list. As an input, we will provide the row-standardized weight matrix and the variables “wpt non-functional” & wpt functional.

Lag for non-functional

We run the code as follow:

lag.list <- list(nga_wp_26391$shapeName, lag.listw(swm_d72, nga_wp_26391$`wpt non-functional`))
lag.res_NF <- as.data.frame(lag.list)
colnames(lag.res_NF) <- c("shapeName", "lag_NF")
nga_wp_26391$lag_NF <- lag.res_NF$lag_NF

We will then plot the lagged variable:

non_func <- qtm(nga_wp_26391, "wpt non-functional") +
tm_layout(main.title = "WPT Non-Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "bottom"),
            legend.width = 0.3,
            legend.height = 0.25,
            frame = TRUE)
lag_non_func <- qtm(nga_wp_26391, "lag_NF")+
tm_layout(main.title = "Lagged WPT Non-Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "bottom"),
            legend.width = 0.3,
            legend.height = 0.25,
            frame = TRUE)
tmap_arrange(non_func, lag_non_func, asp=1, ncol=2)

Lag for functional

We run the code as follow:

lag.list <- list(nga_wp_26391$shapeName, lag.listw(swm_d72, nga_wp_26391$`wpt functional`))
lag.res_F <- as.data.frame(lag.list)
colnames(lag.res_F) <- c("shapeName", "lag_F")
nga_wp_26391$lag_F <- lag.res_F$lag_F
non_func <- qtm(nga_wp_26391, "wpt functional") +
tm_layout(main.title = "WPT Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "bottom"),
            legend.width = 0.3,
            legend.height = 0.25,
            frame = TRUE)
lag_non_func <- qtm(nga_wp_26391, "lag_F")+
tm_layout(main.title = "Lagged WPT Functional",
            main.title.position = "center",
            main.title.size = 1,
            legend.position = c("right", "bottom"),
            legend.width = 0.3,
            legend.height = 0.25,
            frame = TRUE)
tmap_arrange(non_func, lag_non_func, asp=1, ncol=2)

Step 9: Measure Global Spatial Auto Correlation

Global Moran’s I

To get an indication of the global spatial clustering auto-correlation, we will perform the Moran’s I test. This is an analytic approach.

WPT Non-Functional Moran’s I test:

moran.test(nga_wp_26391$`wpt non-functional`,
           listw=swm_d72,
           zero.policy = TRUE,
           na.action=na.omit)

    Moran I test under randomisation

data:  nga_wp_26391$`wpt non-functional`  
weights: swm_d72    

Moran I statistic standard deviate = 22.437, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.3264059335     -0.0012936611      0.0002133094 

The global Moran’s I statistic is >> 0 and the p-value is statistically significant. This shows that there is indication of clustering of the non-functional water point.


WPT Functional Moran’s I test:

moran.test(nga_wp_26391$`wpt functional`,
           listw=swm_d72,
           zero.policy = TRUE,
           na.action=na.omit)

    Moran I test under randomisation

data:  nga_wp_26391$`wpt functional`  
weights: swm_d72    

Moran I statistic standard deviate = 34.908, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.5041687788     -0.0012936611      0.0002096643 

The global Moran’s I statistic is >> 0 and the p-value is statistically significant. This shows that there is indication of clustering of the functional water point.


Monte Carlo Moran’s I

While the Moran’s I test is fast since it works analytically, we will need to perform Monte Carlo Moran’s I. This method allows us to perform simulation by generating many random datasets across multiple simulations.

Our Moran’s I should fall to either extremes of the Moran’s I histogram from the simulation. This shows that our Moran’s I value did not occur because of randomization.

For the Monte Carlo Moran’s I test, we will use the moran.mc() function from the spdep package.

WPT Non-Functional Moran’s I test:

set.seed(1234)
bperm_NF= moran.mc(nga_wp_26391$`wpt non-functional`, 
                listw=swm_d72, 
                nsim=999, 
                zero.policy = TRUE, 
                na.action=na.omit)
bperm_NF

    Monte-Carlo simulation of Moran I

data:  nga_wp_26391$`wpt non-functional` 
weights: swm_d72  
number of simulations + 1: 1000 

statistic = 0.32641, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Here we can see that our results are statistically significant with p value of 0.001. Also the Moran’s I calculated is similar to the analytic test.

We will visualize the simulation results from Monte Carlo as follow:

hist(bperm_NF$res, 
     freq=TRUE, 
     breaks=20, 
     xlab="Simulated Moran's I for Non-Functional")
abline(v=0, 
       col="red") 


WPT Functional Moran’s I test:

set.seed(1234)
bperm_F= moran.mc(nga_wp_26391$`wpt functional`, 
                listw=swm_d72, 
                nsim=999, 
                zero.policy = TRUE, 
                na.action=na.omit)
bperm_F

    Monte-Carlo simulation of Moran I

data:  nga_wp_26391$`wpt functional` 
weights: swm_d72  
number of simulations + 1: 1000 

statistic = 0.50417, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Here we can see that our results are statistically significant with p value of 0.001. Also the Moran’s I calculated is similar to the analytic test.

We will visualize the simulation results from Monte Carlo as follow:

hist(bperm_F$res, 
     freq=TRUE, 
     breaks=20, 
     xlab="Simulated Moran's I for Non-Functional")
abline(v=0, 
       col="red") 

Step 10: Correlogram

In the earlier section, we did the computation for Moran’s I and Monte Carlo Moran’s I. Both of them are a global measure of spatial autocorrelation. For both the non-functional and functional water point, we observe the presence of spatial correlation.

Another way for us to examine the spatial autocorrelation is by using correlogram. These diagram provide useful insights to how the Moran’s I or Geary’s C changes as the spatial lag/distance increase.

With the increase of spatial lad / distance, more neighbors are now included within the weight matrix. This then have further impact on the measure of spatial correlation.

For this section, we will use sp.correlogram() of spdep package. This function will require the neighbor object that we have calculated precious in step 7. We also need to state the style input. We will choose “W” for row-standardization.

We run the correlogram for the non-functional water point as follow:

MI_corr_NF <- sp.correlogram(nb_d72, 
                          nga_wp_26391$`wpt non-functional`, 
                          order=6, 
                          method="I", 
                          style="W",
                          zero.policy=TRUE)
plot(MI_corr_NF)

We check the output for statistical significance:

print(MI_corr_NF)
Spatial correlogram for nga_wp_26391$`wpt non-functional` 
method: Moran's I
           estimate expectation    variance standard deviate Pr(I) two sided
1 (774)  3.2641e-01 -1.2937e-03  2.1331e-04          22.4373       < 2.2e-16
2 (772)  1.7632e-01 -1.2970e-03  1.0426e-04          17.3957       < 2.2e-16
3 (772)  7.5578e-02 -1.2970e-03  7.8364e-05           8.6841       < 2.2e-16
4 (772) -8.3811e-03 -1.2970e-03  7.0841e-05          -0.8417             0.4
5 (772) -7.8042e-02 -1.2970e-03  6.4305e-05          -9.5703       < 2.2e-16
6 (772) -5.8045e-02 -1.2970e-03  5.6307e-05          -7.5626       3.952e-14
           
1 (774) ***
2 (772) ***
3 (772) ***
4 (772)    
5 (772) ***
6 (772) ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the non-functional water points, our Moran’s I decrease with increase lag order. The p values for the first 3 rows are statistically significant.


We run the correlogram for the functional water point as follow:

MI_corr_F <- sp.correlogram(nb_d72, 
                          nga_wp_26391$`wpt functional`, 
                          order=10, 
                          method="I", 
                          style="W",
                          zero.policy=TRUE)
plot(MI_corr_F)

We check the output for statistical significance:

print(MI_corr_F)
Spatial correlogram for nga_wp_26391$`wpt functional` 
method: Moran's I
            estimate expectation    variance standard deviate Pr(I) two sided
1 (774)   5.0417e-01 -1.2937e-03  2.0966e-04          34.9081         < 2e-16
2 (772)   3.8586e-01 -1.2970e-03  1.0247e-04          38.2455         < 2e-16
3 (772)   3.2289e-01 -1.2970e-03  7.7024e-05          36.9384         < 2e-16
4 (772)   2.3024e-01 -1.2970e-03  6.9640e-05          27.7456         < 2e-16
5 (772)   1.3859e-01 -1.2970e-03  6.3217e-05          17.5942         < 2e-16
6 (772)   6.0259e-02 -1.2970e-03  5.5351e-05           8.2738         < 2e-16
7 (772)  -1.4690e-02 -1.2970e-03  4.9489e-05          -1.9038         0.05694
8 (772)  -1.0019e-01 -1.2970e-03  4.8999e-05         -14.1278         < 2e-16
9 (772)  -1.5706e-01 -1.2970e-03  5.4069e-05         -21.1829         < 2e-16
10 (772) -1.7121e-01 -1.2970e-03  5.9742e-05         -21.9828         < 2e-16
            
1 (774)  ***
2 (772)  ***
3 (772)  ***
4 (772)  ***
5 (772)  ***
6 (772)  ***
7 (772)  .  
8 (772)  ***
9 (772)  ***
10 (772) ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the functional water points, our Moran’s I decrease more slowly lag order. The p values for the first 6 rows are statistically significant.


Plan: